Third Year Report
Abstract
My PhD research focuses on Text Mining, one major research school in Knowledge Discovery in Databases (KDD), and in particular on language-independent Documentbase Pre-processing (DPP) for the classification / categorisation of documents, known as Text Classification (TC), using novel algorithms to identify hidden patterns, rules, regularities and/or trends within these documents. Significant techniques from Data Mining, the classical research school in KDD that parallels Text Mining, such as Classification Rule Mining (CRM) and Association Rule Mining (ARM), are used to support this research, especially when dealing with a very large documentbase. One possible way to understand the framework of TC is to split it into DPP plus CRM. When Classification Association Rule Mining (CARM), a well-established intersection of CRM and ARM, is applied to TC, (1) a large volume of textual data can be handled (a given documentbase usually consists of more than 10,000 documents, each containing hundreds of words), and (2) a small amount of noisy data can be tolerated (for example, a number of misspelt and/or cross-language words in documents). Based on (2), this approach can be regarded as language-insensitive or language-independent in some respects, while acknowledging that alternative approaches exist. With regard to the CARM-based approach, developing a TC-oriented Language-independent DPP (TC-LI-DPP) approach will consequently result in an “almighty” TC approach, which has general applicability regardless of the language(s) in which the documentbase to be classified is presented. This report, which details what has been done within the third year of my PhD research, describes and compares a number of language-independent DPP techniques that support single-label N-class TC.
The discussion focuses on the vector space / “bag-of-*” model while acknowledging that alternative approaches to language-independent DPP exist. A simple but effective statistical key / significant word identification approach is proposed, which in turn is coupled with a number of phrase identification mechanisms. The emphasis in all cases is on language-independence, so that the techniques described have general applicability regardless of the language(s) in which the documentbase to be mined is presented.